Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use_file: Remove use of spin-locks #125

Merged
merged 3 commits into from
Jan 7, 2020
Merged

Conversation

josephlr
Copy link
Member

@josephlr josephlr commented Jan 2, 2020

Remove the general purpose spin-lock from getrandom, and don't spin
when polling /dev/random. We also remove the use of spin locks when
opening the persistent fd for platforms that require it.

For both these cases, we can just use the pthread lock/unlock methods
in libc. We also do some minor cleanup to better make use of Result
types and DropGuards.

Essentially, this change does the "standard" Mutex based synchronization approach, it just has to do it without libstd. With this change, we continue to have the property that getrandom uses at most one file descriptor per process.

EDIT: Updated description to indicate that we are removing all uses of spin locks.

Thanks to @matklad for pointing this out. See his blog post about getrandom and matklad/once_cell#61.

Signed-off-by: Joe Richey joerichey@google.com

@josephlr josephlr requested review from newpavlov and dhardy January 2, 2020 08:38
Copy link

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, as the open() syscall will never block when opening a device

This is better, but, in theory, and if you are really unlucky, can still lead to a priority inversion, if the thread is preempted. Non-blocking massively reduces chances of preemption, but does not make them zero. Moreover, we are doing a syscall and I have vague recollections that the kernel uses syscalls as a chance to check if the quant is exhausted, so preemtion probability might actually be higher than in the uniformly distributed case. So, this is likely ok, but not almost surely ok :)

src/use_file.rs Outdated Show resolved Hide resolved
@josephlr
Copy link
Member Author

josephlr commented Jan 2, 2020

This is fine, as the open() syscall will never block when opening a device

This is better, but, in theory, and if you are really unlucky, can still lead to a priority inversion, if the thread is preempted.

I agree, after looking over the code, I think removing spinning entirely is the best bet (and also makes the code more readable). In the very rare case we race on the open call, it's fine to open /dev/urandom twice and then just close one.

@matklad
Copy link

matklad commented Jan 2, 2020

It just occurred to me that if we are using the open syscall here then we ... can just call pthread_lock as well?

@josephlr
Copy link
Member Author

josephlr commented Jan 2, 2020

It just occurred to me that if we are using the open syscall here then we ... can just call pthread_lock as well?

I looked into doing that originally, and then again when I was fixing this issue. It seems much more complicated to do than the current code. There are a bunch of pthread methods you have to call (or at least the stdlib calls them, I'm not sure the complexity (vs this PR) is worth it.

@matklad
Copy link

matklad commented Jan 2, 2020

There are a bunch of pthread methods you have to call (or at least the stdlib calls them

stdlib needs to guard agains reentrant locking, which we don't have to, so the implementation is pretty straight forward (modulo the lack of UnsafeSyncCell):

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=053aa21299f80833e3ee779a05bf55c5

I like it because it doges the other edge case -- unbounded number of concurrently opened file descriptors (not that my assertion from the blog that an init functions would be run at most #cores times is wrong).

@matklad
Copy link

matklad commented Jan 2, 2020

stdlib needs to guard agains reentrant locking, which we don't have to

Well, which I think we don't need to do, for the following reasons:

  • we don't run any user code while holding the mutex (in particular, no closure, generics or custom Drops are in sight)
  • so the only reason why we might reenter the function is signal handlers, but we don't advertise getrandom as async safe

@gnzlbg
Copy link

gnzlbg commented Jan 2, 2020

I like it because it doges the other edge case -- unbounded number of concurrently opened file descriptors

I was just going to mention this. File-descriptors are a finite resource on most OSes, and I'd be more comfortable with a solution that does not risk exhausting them in pathological cases.

@josephlr josephlr changed the title use_file: Avoid use of spin-locks use_file: Remove use of spin-locks Jan 2, 2020
@josephlr
Copy link
Member Author

josephlr commented Jan 3, 2020

@matklad @gnzlbg I think you are both right. The code to use pthreads wasn't that bad (after I added a DropGuard helper type) and it avoids some weird pathological edge cases.

Let me know what you think.

// numbers. The file will be opened exactly once. All successful calls will
// return the same file descriptor. This file descriptor is never closed.
fn get_rng_fd() -> Result<libc::c_int, Error> {
static FD: AtomicUsize = AtomicUsize::new(LazyUsize::UNINIT);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how often this needs to happen, or whether this happening is worth optimizing, but a different way to implement this is to do something like this to keep the common path branchless: /~https://github.com/calebzulawski/multiversion/pull/6/files#diff-0a878480aac95b54dd822d02c4ad345eR76

cc @TethysSvensson - it might be a thing worth attempting in a subsequent PR after an approach that works "correctly" gets merged.

Copy link

@TethysSvensson TethysSvensson Jan 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gnzlbg The reason why that trick works in multiversion is, that the value we are trying to lazily compute is already a function pointer. This means that the best-case performance is always going to involve at least 1 indirect function call. The trick makes sure that the common path does not pay anything in addition to that 1 indirect function.

Here it looks like you are trying to lazily compute a file descriptor. As far as I can tell, the current cost in the common path is a single conditional branch on a relaxed load. This appears to me to be cheaper than alternative cost of 1 indirect function call on a relaxed load.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have one suggestion though, assuming that you would ever want to inline get_rng_fd. I think in that case, you would want to inline the initial check, while keeping the slow initializer un-inlined. I am imagining something like this:

#[inline]
fn get_rng_fd() -> Result<libc::c_int, Error> {
    static FD: AtomicUsize = AtomicUsize::new(LazyUsize::UNINIT);

    if let Some(fd) = get_fd() {
        Ok(fd)
    } else {
        initialize_slow()
    }

    #[inline(always)]
    fn get_fd() -> Option<libc::c_int> { ... }
    #[inline(always)]
    fn initialize_slow() -> Result<libc::c_int, Error> { ... }
}

However if I understand this code correctly, the I don't think this function is meant to be inlined(?), so in that case it does not really matter either way.

Copy link
Member Author

@josephlr josephlr Jan 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However if I understand this code correctly, the I don't think this function is meant to be inlined(?), so in that case it does not really matter either way.

This is correct, the only external function of this crate is getrandom::getrandom which is not marked #[inline], so I don't think marking internal functions #[inline] really does anything.

In fact it looks like (in release mode) the compiler just inlines everything in this file into use_file::getrandom_inner. I checked with the current nightly build.

EDIT: here's the ASM when compiled with --release.

src/use_file.rs Outdated Show resolved Hide resolved
Copy link

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, noticed only a couple of unrelated nits.

src/use_file.rs Outdated Show resolved Hide resolved
src/util_libc.rs Outdated Show resolved Hide resolved
Signed-off-by: Joe Richey <joerichey@google.com>
Don't spin when polling /dev/random. We also remove the use of spin
locks when opening the persistent fd for platforms that require it.

For both these cases, we can just use the pthread lock/unlock methods
in libc. This includes adding Mutex and DropGuard abstractions.

Signed-off-by: Joe Richey <joerichey@google.com>
We no longer use spin-locks anywhere in getrandom, so remove any
interfaces which spin.

Signed-off-by: Joe Richey <joerichey@google.com>
@josephlr
Copy link
Member Author

josephlr commented Jan 6, 2020

@dhardy @newpavlov This PR is ready for review/merging. Comments have been addressed, and the CI is passing.

get_rng_fd is the main functional change, most of the remaining changes are either nits or code moving around. I'd either "rebase" or "merge" this PR as to not lose the commit history (where I tried to split up this change).

Copy link
Member

@newpavlov newpavlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I have only one question and if @dhardy does not have any objections I will do the merge.

// before returning, making sure we don't violate the pthread_mutex_t API.
static MUTEX: Mutex = Mutex::new();
unsafe { MUTEX.lock() };
let _guard = DropGuard(|| unsafe { MUTEX.unlock() });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed that destructor will run right before function exit (be it return or panic)? For some reason I thought that with NLL drop place can change depending on liveness analysis.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NLL (and its future extensions w/ Polonious or what not) is just for Lifetimes. Drop is still scope based, so the drop method is always executed when the object leaves scope. As the guard is at function scope, it drops on return or on panic unwinding.

The liveness analysis is just used to determine if a set of borrows are permitted. This is why (modulo compiler bugs) NLL was introduced without it being a breaking change.

Docs: https://doc.rust-lang.org/book/ch15-03-drop.html

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question above, otherwise looks good to me.

}
unsafe fn lock(&self) {
let r = libc::pthread_mutex_lock(self.0.get());
debug_assert_eq!(r, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not check the return value in all builds? (Also unlock.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdlib also uses debug_assert, and man page specifically says that lock/unlock basically can't fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha. Closer reading of the possible error values shows that none should be applicable in this case, and if any were that would be a code error, so okay. (Assuming EAGAIN is specific to recursive locks.)


unsafe impl Sync for Mutex {}

struct DropGuard<F: FnMut()>(F);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this isn't a part of the core library!

Copy link

@matklad matklad Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Often times, this does not work in higher level code, because closure borrows the environment, and so you can't modify it. We only able to use it because we close over low-level file descriptor, which is Copy.

To be clear, it is an oftentimes useful utility, but it is not as useful in practice as it could seem.

EDIT: you can, of course, make a guard smart-pointer with DerefMut (https://docs.rs/scopeguard/1.0.0/scopeguard/#scope-guard-with-value), but that's a more complex API with some design choices.

@newpavlov newpavlov merged commit c5e2025 into rust-random:master Jan 7, 2020
@josephlr josephlr deleted the spin branch January 7, 2020 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants